Ordering Categorical Data to Improve
Abstract
Visualization provides a means for exploratory analysis of large-scale, complex data. In domains such as network management, these data often have categorical attributes, such as host names and event types. Unfortunately, large-scale visualizations of categorical data are difficult to construct since categorical values have no inherent order. We consider two visual tasks: finding groups of similar values of a single categorical attribute and finding relationships between values of different categorical attributes. To assist in these tasks, we develop an algorithm that orders categorical values using the following steps: (1) construct natural clusters of categorical values based on domain semantics (e.g., group together hosts that emit events at the same time); (2) order the clusters; and (3) order the categorical values within each cluster. We demonstrate the benefits of this approach by applying it to scatter plots and parallel coordinate plots of event data collected from a corporate Intranet.
I. Introduction
Visualization has become increasingly important for the analysis and exploration of multidimensional data (e.g., [5], [1], [3], [6]). While a great deal of work has addressed visualization of numeric data, many domains require visualization of large amounts of categorical data. Examples of such data are city names in census data, equipment manufacturers in inventory data, and host names in network management data. Unlike numeric data, categorical values do not have an order. This is problematic for commonly used visualization techniques, such as scatter plots and parallel coordinate plots, since categorical values need to be mapped to axis coordinates. Technically, any order of the categorical values is valid. However, as we will show later using data collected from a corporate Intranet, properly ordering the data can greatly improve the quality of the visualizations.
Although ordering (or permuting) categorical values is known to be important for visual analysis of categorical data [6], we are aware of only two approaches that have been applied to plots. The first orders values manually. This requires a knowledgeable end-user, and
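The three-step ordering outlined in the abstract (cluster the values, order the clusters, order values within each cluster) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The greedy Jaccard clustering, the `threshold` parameter, the largest-cluster-first rule, and the within-cluster ordering by activity are all assumptions chosen for illustration, as are the function names and the sample events.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def order_categories(events, threshold=0.5):
    """Return categorical values ordered by cluster, then within cluster.

    events: iterable of (value, time_bucket) pairs, e.g. (host, minute).
    threshold: hypothetical similarity cutoff for joining a cluster.
    """
    # Profile each value by the set of time buckets in which it appears.
    profile = defaultdict(set)
    for value, bucket in events:
        profile[value].add(bucket)

    # Step 1: greedy clustering (an assumed method). A value joins the
    # first cluster whose seed (first member) is similar enough,
    # otherwise it starts a new cluster.
    clusters = []
    for value in sorted(profile):
        for cluster in clusters:
            if jaccard(profile[value], profile[cluster[0]]) >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])

    # Step 2: order the clusters, here largest first (assumed criterion).
    clusters.sort(key=lambda c: (-len(c), c[0]))

    # Step 3: order values within each cluster, here by number of
    # distinct active buckets, ties broken alphabetically (assumed).
    ordered = []
    for cluster in clusters:
        cluster.sort(key=lambda v: (-len(profile[v]), v))
        ordered.extend(cluster)
    return ordered


# Hypothetical event data: (host name, time bucket).
events = [
    ("hostA", 1), ("hostA", 2), ("hostA", 3),
    ("hostB", 1), ("hostB", 2),
    ("hostC", 9),
    ("hostD", 9), ("hostD", 10),
]

# Map each host to an axis coordinate for a scatter or parallel
# coordinate plot.
axis_position = {v: i for i, v in enumerate(order_categories(events))}
```

With this sample, hosts that emit events in the same time buckets receive adjacent axis positions, which is the property the abstract relies on to make groups visible in a plot.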
Similar resources
A Cluster Based MARDL Algorithm for Drifting Categorical Data
Clustering is an important problem in data mining. Most of the earlier work on clustering focused on numeric attributes which have a natural ordering on their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received some attention. However, previous algorithms do not give a formal description of the clusters the...
A Divisive Ordering Algorithm for Mapping Categorical Data to Numeric Data
The computing time for K Nearest Neighbor Search is linear in the size of the dataset if the dataset is not indexed. This is not tolerable for on-line applications with time constraints when the dataset is large. However, if there are categorical attributes in the dataset, an index cannot be built on the dataset. One possible solution to index such datasets is to convert categorical a...
Clustering Numerical and Categorical Data
Clustering is an important technique for data mining which allows us to discover unknown relationships in our data sets. Clustering algorithms that use metrics based on the natural ordering of numbers cannot be applied to categorical (non-numerical) data. In this tutorial we will review the main methods for numerical data clustering (K-Means, Hierarchical Clustering and Fuzzy C-Means) and then s...
The logic of interpreting evidence of developmental ordering: strong inference and categorical measures.
Developmental ordering is a fundamental prediction of developmental theories and a central issue in developmental research. However, logically sound evidence of developmental ordering is difficult to obtain. This article analyzes the logical basis of testing developmental order hypotheses with categorical measures. Depending on whether saltatory (i.e., discrete) or continuous developmental chan...
Efficient layered density-based clustering of categorical data
A challenge involved in applying density-based clustering to categorical biomedical data is that the "cube" of attribute values has no ordering defined, making the search for dense subspaces slow. We propose the HIERDENC algorithm for hierarchical density-based clustering of categorical data, and a complementary index for searching for dense subspaces efficiently. The HIERDENC index is updated ...
Publication date: 1999